ORB-SLAM: A Versatile and Accurate Monocular SLAM System
This paper presents ORB-SLAM, a feature-based monocular simultaneous localization and mapping (SLAM) system that operates in real time, in small and large indoor and outdoor environments. The system is robust to severe motion clutter, allows wide-baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival-of-the-fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.
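To make the survival-of-the-fittest idea concrete, here is a minimal Python sketch of ORB-SLAM-style keyframe culling, where a keyframe is discarded once most of its map points are seen by enough other keyframes. The data structures are illustrative; the 90%/three-keyframe thresholds follow the paper, which additionally checks observation scale (omitted here).

```python
# Illustrative ORB-SLAM-style keyframe culling: a keyframe is redundant
# when >= 90% of its map points are observed by at least three other
# keyframes (the paper also requires same-or-finer scale, omitted here).

def redundant_keyframes(keyframes, min_other_obs=3, ratio=0.90):
    """Return keyframes that can be culled to keep the map compact."""
    redundant = []
    for kf in keyframes:
        points = kf.map_points  # map points seen by this keyframe (assumed attribute)
        if not points:
            continue
        seen_elsewhere = sum(
            1 for p in points
            # observing_keyframes: set of keyframes that observe point p
            if len(p.observing_keyframes - {kf}) >= min_other_obs
        )
        if seen_elsewhere / len(points) >= ratio:
            redundant.append(kf)  # safe to cull: map content is preserved
    return redundant
```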
RIDI: Robust IMU Double Integration
This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions just from an inertial measurement unit (IMU) in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear accelerations and angular velocities, then corrects low-frequency bias in the linear accelerations, which are integrated twice to estimate positions. We have acquired training data with ground-truth motions across multiple human subjects and multiple phone placements (e.g., in a bag or a hand). Qualitative and quantitative evaluations have demonstrated that our algorithm shows surprisingly comparable results to full visual-inertial navigation. To our knowledge, this paper is the first to integrate sophisticated machine learning techniques with inertial navigation, potentially opening up a new line of research in the domain of data-driven inertial navigation. We will publicly share our code and data to facilitate further research.
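As a rough illustration of the regress-then-correct pipeline described above, the sketch below applies a per-window constant-bias correction so that doubly integrated accelerations stay consistent with the regressed velocities. RIDI's actual correction is more elaborate; the window size and the `v_pred` input (from any learned regressor) are assumptions here.

```python
import numpy as np

def double_integrate_with_correction(acc, v_pred, dt, window=200):
    """acc: (N,3) linear accelerations in a gravity-free world frame.
    v_pred: (N,3) velocities regressed from the IMU history (learned part).
    dt: sample period in seconds. Returns (N,3) estimated positions."""
    acc = acc.copy()
    # Correct low-frequency accelerometer bias per window so that the
    # integrated velocity agrees with the regressed velocity change.
    for s in range(0, len(acc), window):
        e = min(s + window, len(acc))
        v_int = np.cumsum(acc[s:e] * dt, axis=0)
        drift = v_int[-1] - (v_pred[e - 1] - v_pred[s])  # velocity drift over window
        acc[s:e] -= drift / ((e - s) * dt)               # constant-bias correction
    vel = np.cumsum(acc * dt, axis=0)   # first integration: velocity
    pos = np.cumsum(vel * dt, axis=0)   # second integration: position
    return pos
```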
Integrating Simulink, OpenVX, and ROS for Model-Based Design of Embedded Vision Applications
OpenVX is increasingly gaining consensus as a standard platform for developing portable, optimized, and power-efficient embedded vision applications. Nevertheless, adopting OpenVX for rapid prototyping, early algorithm parametrization, and validation of complex embedded applications is a very challenging task. This paper presents a comprehensive framework that integrates Simulink, OpenVX, and ROS for model-based design of embedded vision applications. The framework allows applying MATLAB/Simulink for the model-based design, parametrization, and validation of computer vision applications. Then, it allows for the automatic synthesis of the application model into an OpenVX description for hardware- and constraints-aware application tuning. Finally, the methodology allows integrating the OpenVX application with the Robot Operating System (ROS), the de-facto reference standard for developing robotic software applications. The OpenVX-ROS interface allows co-simulating and parametrizing the application by considering the actual robotic environment, and enables application reuse in any ROS-compliant system. Experiments have been conducted on two real case studies: an application for digital image stabilization and the ORB descriptor for simultaneous localization and mapping (SLAM), both developed in Simulink and then automatically synthesized into OpenVX-VisionWorks code for an NVIDIA Jetson TX2 board.
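The ROS side of such an interface can be pictured as an ordinary image-in/image-out node. The hypothetical sketch below shows where a synthesized OpenVX pipeline would plug into a ROS graph; the topic names and the `process_frame` stub are assumptions, not the framework's actual generated code.

```python
#!/usr/bin/env python
# Hypothetical ROS node wrapping a synthesized OpenVX pipeline (stubbed).
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

def process_frame(frame):
    # Placeholder for executing the OpenVX graph generated from Simulink.
    return frame

class VxBridge:
    def __init__(self):
        self.bridge = CvBridge()
        self.pub = rospy.Publisher("/vx/output", Image, queue_size=1)
        rospy.Subscriber("/camera/image_raw", Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        out = process_frame(frame)  # run the vision kernel on this frame
        self.pub.publish(self.bridge.cv2_to_imgmsg(out, encoding="bgr8"))

if __name__ == "__main__":
    rospy.init_node("openvx_bridge")
    VxBridge()
    rospy.spin()
```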
A scalable FPGA-based architecture for depth estimation in SLAM
The current state of the art of Simultaneous Localisation and Mapping, or SLAM, on low-power embedded systems is sparse localisation and mapping with low-resolution results, in the name of efficiency. Meanwhile, research in this field has provided many advances in information-rich processing and semantic understanding, which come with high computational requirements for real-time processing. This work provides a solution to bridge this gap, in the form of a scalable, SLAM-specific architecture for depth estimation in direct semi-dense SLAM. Targeting an off-the-shelf FPGA-SoC, this accelerator architecture achieves a rate of more than 60 mapped frames/sec at a resolution of 640×480, performance on par with a highly optimised parallel implementation on a high-end desktop CPU at an order of magnitude lower power consumption. Furthermore, the developed architecture is combined with our previous work on tracking to form the first complete accelerator for semi-dense SLAM on FPGAs, establishing the state of the art in the area of embedded low-power systems.
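For intuition about what such an accelerator parallelizes, here is a deliberately naive software model of semi-dense depth estimation, cast as a rectified two-view search for simplicity: only high-gradient pixels receive a depth hypothesis, obtained by a fixed-window SSD search along the epipolar scanline. All parameters are illustrative; the hardware mapping in the paper is far more involved.

```python
import numpy as np

def semi_dense_depth(left, right, fx, baseline, max_disp=64,
                     patch=3, grad_thresh=20.0):
    """Semi-dense depth from a rectified image pair (toy model)."""
    h, w = left.shape
    depth = np.zeros((h, w), dtype=np.float32)
    gx = np.abs(np.gradient(left.astype(np.float32), axis=1))
    r = patch // 2
    for y in range(r, h - r):
        for x in range(max_disp + r, w - r):
            if gx[y, x] < grad_thresh:          # skip textureless pixels
                continue
            ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32)
            best, best_d = np.inf, 1
            for d in range(1, max_disp):        # SSD search along the scanline
                cand = right[y - r:y + r + 1,
                             x - d - r:x - d + r + 1].astype(np.float32)
                ssd = np.sum((ref - cand) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            depth[y, x] = fx * baseline / best_d  # disparity -> metric depth
    return depth
```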
Stereo Vision-based Semantic 3D Object and Ego-motion Tracking for Autonomous Driving
We propose a stereo vision-based approach for tracking the camera ego-motion and 3D semantic objects in dynamic autonomous driving scenarios. Instead of directly regressing the 3D bounding box using end-to-end approaches, we propose to use easy-to-label 2D detections and discrete viewpoint classification together with a light-weight semantic inference method to obtain rough 3D object measurements. Based on object-aware-aided camera pose tracking, which is robust in dynamic environments, in combination with our novel dynamic object bundle adjustment (BA) approach to fuse temporal sparse feature correspondences and the semantic 3D measurement model, we obtain 3D object pose, velocity, and anchored dynamic point cloud estimation with instance accuracy and temporal consistency. The performance of our proposed method is demonstrated in diverse scenarios. Both the ego-motion estimation and object localization are compared with state-of-the-art solutions. (14 pages, 9 figures, ECCV 2018)
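As a generic illustration of how a rough 3D object measurement can be derived from a 2D detection (not the paper's exact inference), the sketch below assumes a pinhole camera and a class-level object height prior: the pixel height of the box fixes depth by similar triangles, and the box center fixes the bearing.

```python
import numpy as np

def rough_3d_from_2d(box, K, object_height=1.5):
    """box = (x1, y1, x2, y2) in pixels; K = 3x3 camera intrinsics.
    object_height: assumed metric height prior for the object class.
    Returns an approximate 3D object center in the camera frame."""
    x1, y1, x2, y2 = box
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    h_pix = y2 - y1
    z = fy * object_height / h_pix            # similar triangles: depth from height
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # box center pixel
    x = (u - cx) * z / fx                     # backproject the center ray to depth z
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```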
Direct Sparse Odometry with Rolling Shutter
Neglecting the effects of rolling-shutter cameras in visual odometry (VO) severely degrades accuracy and robustness. In this paper, we propose a novel direct monocular VO method that incorporates a rolling-shutter model. Our approach extends direct sparse odometry, which performs direct bundle adjustment of a set of recent keyframe poses and the depths of a sparse set of image points. We estimate the velocity at each keyframe and impose a constant-velocity prior for the optimization. In this way, we obtain a near real-time, accurate direct VO method. Our approach achieves improved results on challenging rolling-shutter sequences over state-of-the-art global-shutter VO.
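The core of a constant-velocity rolling-shutter model can be sketched as follows: each image row is exposed with a known line delay, so its camera pose is the keyframe pose advanced along the estimated twist for that delay. This is a first-order illustration with assumed conventions, not the paper's exact parameterization.

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    th = np.linalg.norm(w)
    if th < 1e-9:
        return np.eye(3)
    k = w / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

def pose_at_row(T0, twist, row, line_delay):
    """T0: 4x4 keyframe pose; twist = (vx, vy, vz, wx, wy, wz): estimated
    linear/angular velocity; line_delay: seconds between consecutive rows.
    Returns the camera pose while reading `row` under constant velocity."""
    dt = row * line_delay
    v, w = twist[:3], twist[3:]
    dT = np.eye(4)
    dT[:3, :3] = so3_exp(w * dt)
    dT[:3, 3] = v * dt   # first-order translation (omits the SE(3) V term)
    return dT @ T0
```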
DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points
Multi-view stereo (MVS) is the golden mean between the accuracy of active depth sensing and the practicality of monocular depth estimation. Cost volume based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of MVS systems. However, this accuracy comes at a high computational cost which impedes practical adoption. Distinct from cost volume approaches, we propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally (c) densifying this sparse set of 3D points using CNNs. An end-to-end network efficiently performs all three steps within a deep learning framework and is trained with intermediate 2D image and 3D geometric supervision, along with depth supervision. Crucially, our first step complements pose estimation using interest point detection and descriptor learning. We demonstrate state-of-the-art results on depth estimation with lower compute for different scene lengths. Furthermore, our method generalizes to newer environments and the descriptors output by our network compare favorably to strong baselines. Code is available at https://github.com/magicleap/DELTAS. (ECCV 2020)
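Step (b) above has a classical counterpart: given camera poses and a pair of matched interest points, linear (DLT) triangulation recovers the sparse 3D point that the final stage then densifies. A standard sketch of that operation, not the paper's learned, differentiable variant:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.
    P1, P2: 3x4 projection matrices; x1, x2: matched pixel coords (u, v)."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],   # each match gives two linear constraints
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)  # least-squares null vector of A
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize to a 3D point
```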
Vehicle Trajectories from Unlabeled Data through Iterative Plane Registration
One of the most complex aspects of autonomous driving concerns understanding the surrounding environment. In particular, interest falls on detecting which agents populate it and how they move. The capacity to predict how these agents may act in the near future would allow an autonomous vehicle to safely plan its trajectory, minimizing the risks for itself and others. In this work we propose an automatic trajectory annotation method exploiting an Iterative Plane Registration algorithm based on homographies and semantic segmentations. The output of our technique is a set of holistic trajectories (past-present-future), each paired with a single image context, useful to train a predictive model.
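The geometric tool underlying plane-based registration is the homography induced by the ground plane between two camera views. A minimal sketch is given below, with conventions as documented; it does not reproduce the paper's iterative scheme.

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    """Homography induced by the plane {x : n.x = d}, expressed in the first
    camera frame, between two views related by x2 = R @ x1 + t.
    Maps pixels of plane points in view 1 to pixels in view 2."""
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply H to (N,2) pixel coordinates, e.g. agent footprints on the road."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = (H @ pts_h.T).T
    return out[:, :2] / out[:, 2:3]
```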
Vision-Depth Landmarks and Inertial Fusion for Navigation in Degraded Visual Environments
This paper proposes a method for tight fusion of visual, depth, and inertial data in order to extend robotic capabilities for navigation in GPS-denied, poorly illuminated, and texture-less environments. Visual and depth information are fused at the feature detection and descriptor extraction levels to augment one sensing modality with the other. These multimodal features are then further integrated with inertial sensor cues using an extended Kalman filter to estimate the robot pose, sensor bias terms, and landmark positions simultaneously as part of the filter state. As demonstrated through a set of hand-held and Micro Aerial Vehicle experiments, the proposed algorithm is shown to perform reliably in challenging visually-degraded environments using RGB-D information from a lightweight and low-cost sensor and data from an IMU. (11 pages, 6 figures, published in the International Symposium on Visual Computing, ISVC 2018)
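The fusion backbone described above is a standard EKF whose state stacks pose, IMU biases, and landmark positions. A generic skeleton follows; the motion and measurement models f and h and their Jacobians F and H are application-specific placeholders supplied by the caller.

```python
import numpy as np

class EKF:
    """Generic EKF skeleton for visual/depth/inertial fusion: the state x
    stacks robot pose, IMU bias terms, and landmark positions."""

    def __init__(self, x0, P0):
        self.x, self.P = x0, P0

    def predict(self, f, F, Q, u, dt):
        # Propagate with the IMU-driven motion model f; F is its Jacobian
        # evaluated at the current state, Q the process noise covariance.
        self.x = f(self.x, u, dt)
        self.P = F @ self.P @ F.T + Q

    def update(self, z, h, H, R):
        # Fuse one multimodal landmark measurement z = h(x) + noise.
        y = z - h(self.x)                     # innovation
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P
```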